Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the convergence rate still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and convergence time that only slightly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent for problems where the sample size is limited, such as few-shot or adversarial learning.
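The random-smoothing idea underlying PPRS can be illustrated with a minimal sketch (this is not the authors' implementation; the function and parameter names are my own). A non-smooth objective is replaced by its Gaussian-smoothed surrogate, whose gradient is estimated by Monte Carlo sampling and can then be fed to an accelerated method such as Nesterov's:

```python
import numpy as np

def smoothed_grad(f, x, gamma=0.1, m=100, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed
    surrogate f_gamma(x) = E_u[f(x + gamma * u)], u ~ N(0, I).

    The surrogate is differentiable even when f itself is non-smooth,
    which is what makes acceleration applicable."""
    rng = rng or np.random.default_rng(0)
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(m):
        u = rng.standard_normal(d)
        g += (f(x + gamma * u) - f(x)) / gamma * u
    return g / m

# Example: non-smooth f(x) = ||x||_1; away from zero the smoothed
# gradient approximates sign(x).
x = np.array([1.0, -2.0, 0.5])
g = smoothed_grad(lambda z: np.abs(z).sum(), x, gamma=0.05, m=5000)
```

The choice of `gamma` trades off smoothing bias against variance of the estimator; the abstract's $d^{1/4}$ factor reflects the dimension dependence this smoothing introduces.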
Semantic categories of artifacts and animals reflect efficient coding
Zaslavsky, Noga, Regier, Terry, Tishby, Naftali, Kemp, Charles
It has been argued that semantic categories across languages reflect pressure for efficient communication. Recently, this idea has been cast in terms of a general information-theoretic principle of efficiency, the Information Bottleneck (IB) principle, and it has been shown that this principle accounts for the emergence and evolution of named color categories across languages, including soft structure and patterns of inconsistent naming. However, it is not yet clear to what extent this account generalizes to semantic domains other than color. Here we show that it generalizes to two qualitatively different semantic domains: names for containers, and for animals. First, we show that container naming in Dutch and French is near-optimal in the IB sense, and that IB broadly accounts for soft categories and inconsistent naming patterns in both languages. Second, we show that a hierarchy of animal categories derived from IB captures cross-linguistic tendencies in the growth of animal taxonomies. Taken together, these findings suggest that fundamental information-theoretic principles of efficient coding may shape semantic categories across languages and across domains.
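The IB principle referenced above can be made concrete with a sketch of the standard self-consistent IB update for discrete distributions (a generic textbook formulation, not the authors' code; names and the toy distribution are illustrative):

```python
import numpy as np

def ib_step(pxy, qt_x, beta):
    """One self-consistent Information Bottleneck update for a discrete
    joint p(x, y) (shape n_x x n_y) and encoder q(t|x) (shape n_t x n_x).

    The update sets q(t|x) proportional to q(t) * exp(-beta * KL(p(y|x) || q(y|t))),
    trading compression of X against preservation of information about Y."""
    px = pxy.sum(axis=1)                              # p(x)
    py_x = pxy / px[:, None]                          # p(y|x)
    qt = qt_x @ px                                    # marginal q(t)
    qy_t = (qt_x * px[None, :]) @ py_x / qt[:, None]  # decoder q(y|t)
    kl = (py_x[None, :, :]
          * np.log(py_x[None, :, :] / qy_t[:, None, :])).sum(-1)
    new = qt[:, None] * np.exp(-beta * kl)
    return new / new.sum(axis=0, keepdims=True)

# Toy joint: x0 mostly maps to y0, x1 mostly to y1, x2 is ambiguous.
pxy = np.array([[0.30, 0.05],
                [0.05, 0.30],
                [0.15, 0.15]])
qt_x = np.array([[0.9, 0.1, 0.5],     # slightly asymmetric 2-cluster init
                 [0.1, 0.9, 0.5]])
for _ in range(20):
    qt_x = ib_step(pxy, qt_x, beta=5.0)
```

At this `beta` the encoder assigns x0 and x1 to separate clusters while the ambiguous x2 stays softly split, which is the kind of soft category structure the abstract describes.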
On Theoretical Limits of Learning with Label Differential Privacy
Zhao, Puning, Ma, Chuan, Shen, Li, Wang, Shaowei, Fan, Rongfei
Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models, for both classification and regression tasks, characterized by minimax convergence rates. We establish lower bounds by converting each task into a multiple hypothesis testing problem and bounding the test error. Additionally, we develop algorithms that yield matching upper bounds. Our results demonstrate that under label local DP (LDP), the risk converges significantly faster than under full LDP, i.e., protecting both features and labels, indicating the advantage of relaxing the DP definition to focus solely on labels. In contrast, under label central DP (CDP), the risk is reduced only by a constant factor compared to full DP, indicating that this relaxation offers only limited performance benefits.
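For intuition, label local DP is commonly achieved with a randomized-response mechanism applied to the labels alone. Below is a generic sketch (my own illustrative code, not the paper's algorithms) of a k-ary randomized response satisfying epsilon-label-LDP, together with the standard debiasing step for recovering the label distribution:

```python
import numpy as np

def randomized_response(label, k, epsilon, rng):
    """epsilon-label-LDP mechanism: report the true label with probability
    e^eps / (e^eps + k - 1), otherwise one of the k-1 others uniformly."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return label
    other = rng.integers(k - 1)
    return other + (other >= label)  # skip over the true label

def debias_histogram(noisy, k, epsilon):
    """Unbiased estimate of the true label distribution from noisy reports."""
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    freq = np.bincount(noisy, minlength=k) / len(noisy)
    return (freq - q) / (p - q)

# Hypothetical simulation: 3 classes, epsilon = 1.
rng = np.random.default_rng(0)
true = rng.choice(3, size=50_000, p=[0.5, 0.3, 0.2])
noisy = np.array([randomized_response(y, 3, 1.0, rng) for y in true])
est = debias_histogram(noisy, 3, 1.0)
```

Only the labels are perturbed; the features remain public, which is exactly the relaxation whose benefits the paper quantifies.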
Reviews: Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
The relationship between the proposed pipeline parallel optimization setting and existing work is not clear. Does it contain related work as special cases? The authors mention in the abstract that the presented study distributes computation per layer instead of per sample; it would be helpful to give an additional comparison along this line. This was only briefly touched on in Section 2, on asynchronous value/gradient evaluation.
Reviews: Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
The reviewers agreed that this paper is a nice contribution to the literature and provides interesting and potentially useful convergence results in the framework of pipeline parallel optimization. The reviewers were impressed by the rebuttal and encourage the authors to incorporate the clarifications therein into the paper.
Robust Anthropomorphic Robotic Manipulation through Biomimetic Distributed Compliance
The impressive capability of humans to robustly perform manipulation relies on compliant interactions, enabled through the structure and materials spatially distributed in our hands. We propose that by mimicking this distributed compliance in an anthropomorphic robotic hand, open-loop manipulation robustness increases, and we observe the emergence of human-like behaviours. To achieve this, we introduce the ADAPT Hand, equipped with tunable compliance throughout the skin, fingers, and wrist. Through extensive automated pick-and-place tests, we show that grasping robustness closely mirrors an estimated geometric theoretical limit, while `stress-testing' the robot hand over 800+ grasps. Finally, 24 items with widely varying geometries are grasped in a constrained environment with a success rate of 93%. We demonstrate that hand-object self-organization underlies this extreme robustness, with the hand automatically exhibiting different grasp types depending on object geometry. Furthermore, the robot grasp type mimics a natural human grasp with a direct similarity of 68%.
Blue and Green-Mode Energy-Efficient Chemiresistive Sensor Array Realized by Rapid Ensemble Learning
Wang, Zeheng, Cooper, James, Usman, Muhammad, van der Laan, Timothy
The rapid advancement of the Internet of Things (IoT) necessitates the development of optimized Chemiresistive Sensor (CRS) arrays that are both energy-efficient and capable. This study introduces a novel optimization strategy that employs a rapid ensemble learning-based model committee approach to achieve these goals. Utilizing machine learning models such as Elastic Net Regression, Random Forests, and XGBoost, among others, the strategy identifies the most impactful sensors in a CRS array for accurate classification. A weighted voting mechanism is introduced to aggregate the models' opinions in sensor selection, thereby setting up two distinct working modes, termed "Blue" and "Green". The Blue mode operates with all sensors for maximum detection capability, while the Green mode selectively activates only key sensors, significantly reducing energy consumption without compromising detection accuracy. The strategy is validated through theoretical calculations and Monte Carlo simulations, demonstrating its effectiveness and accuracy. The proposed optimization strategy not only elevates the detection capability of CRS arrays but also brings it closer to theoretical limits, promising significant implications for the development of low-cost, easily fabricable next-generation IoT sensor terminals.
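The weighted voting described above can be sketched generically (hypothetical names and data, not the authors' implementation): each committee member contributes its normalized feature importances, weighted by validation accuracy, and the top-scoring sensors form the energy-saving "Green" subset:

```python
import numpy as np

def select_sensors(importances, accuracies, n_keep):
    """Weighted-vote sensor selection: each model's normalized feature
    importances are weighted by its validation accuracy, and the sensors
    with the highest aggregate score are kept (the "Green"-mode subset)."""
    imp = np.asarray(importances, float)
    imp = imp / imp.sum(axis=1, keepdims=True)  # normalize per model
    w = np.asarray(accuracies, float)
    score = w @ imp                             # accuracy-weighted vote
    return np.argsort(score)[::-1][:n_keep]

# Hypothetical committee of three models over a 6-sensor array.
imp = [[5, 1, 1, 1, 1, 1],
       [4, 2, 1, 1, 1, 1],
       [1, 1, 6, 1, 1, 1]]
acc = [0.9, 0.85, 0.6]
keep = select_sensors(imp, acc, n_keep=2)  # sensors retained in Green mode
```

Weighting by validation accuracy lets a weak model's idiosyncratic importance ranking be outvoted without being discarded entirely.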
The theoretical limits of biometry
Biometry has proved its capability in terms of recognition accuracy. It is now widely used for automated border control with the biometric passport, and to unlock a smartphone or a computer with a fingerprint or face recognition algorithm. While identity verification is widely democratized, pure identification with no additional clues is still a work in progress. The difficulty of identification depends on the population size: the larger the group, the greater the risk of confusion. To prevent collisions, biometric traits must be sufficiently distinguishable to scale to considerable groups, and algorithms should be able to capture their differences accurately. Most biometric works are purely experimental, making it impossible to extrapolate the results to a smaller or a larger group. In this work, we propose a theoretical analysis of the distinguishability problem, which governs the error rates of biometric systems. We demonstrate simple relationships between the population size and the number of independent bits necessary to prevent collisions in the presence of noise. This work provides a fundamental lower bound on memory requirements. The results are very encouraging, as the biometry of the whole Earth population can fit on a regular disk, leaving some room for noise and redundancy.
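As a back-of-envelope illustration of the population-size/bit-count relationship (my own birthday-bound estimate, not the paper's exact result; the collision-risk target is an assumed figure):

```python
import math

def bits_needed(population, collision_prob):
    """Birthday bound: with b independent uniform bits per template, the
    chance that any two of N people collide is roughly N^2 / 2^(b+1);
    return the smallest b keeping that below the target probability."""
    return math.ceil(math.log2(population ** 2 / (2 * collision_prob)))

# Whole Earth population, one-in-a-million total collision risk (assumed).
b = bits_needed(8_000_000_000, 1e-6)
storage_bytes = 8_000_000_000 * b / 8  # raw template storage, no redundancy
```

Under these assumptions the raw storage comes out in the tens of gigabytes, consistent with the abstract's claim that the biometry of the whole Earth population fits on a regular disk.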
The Great Race for Military AI and Quantum Computing Is On
On the second day of the COSM 2021 conference, speakers asked -- with appropriate skepticism -- whether we could ever produce true Artificial General Intelligence (AGI). But the final day of the conference hosted a conversation on the realistically achievable forms of AI and quantum computing that may pose existential threats to modern life. Robert J. Marks, Director of the Walter Bradley Center for Natural and Artificial Intelligence (which hosted COSM) -- also Distinguished Professor of Electrical and Computer Engineering at Baylor University -- spoke first. The title of his 2020 book, The Case for Killer Robots: Why America's Military Needs to Continue Development of Lethal AI, provides an unsubtle hint at his position. Marks thinks that AI "will never be sentient. It will never understand what it is doing. And, currently, it has no common sense."